Skip to content

Conversation

@annop-w
Copy link
Contributor

@annop-w annop-w commented Jan 13, 2025

This PR introduces a new thread throttling profile for SGEMM on NEOVERSEV1 CPU core.
Benchmarking for M=N=K=[2, 1024] shows a geometric mean speedup of ~1.1x.

@annop-w
Copy link
Contributor Author

annop-w commented Jan 16, 2025

@martin-frbg Could I please get a review for this patch ?

@martin-frbg
Copy link
Collaborator

As soon as I can concentrate on it. Would you happen to have any benchmark numbers available ? The idea of using varying thread counts has been floated a few times in the past, but the added logic seemed to outweigh gains on x86_64 at least

@annop-w
Copy link
Contributor Author

annop-w commented Jan 16, 2025

@martin-frbg Thank you for your comment. As mentioned in the description, the geo. mean speedup is ~1.1 for M=N=K up to 1024. Here is a figure that shows the data on NEOVERSEV1 with 64 cores.
sgemm_v1

@martin-frbg
Copy link
Collaborator

Hmm, I don't really like the added complexity with yet another "hidden" table of parameters (I assume the V2 would need different thresholds), but what's not to like about the added performance...

@annop-w
Copy link
Contributor Author

annop-w commented Jan 20, 2025

Yes, V2 would need another set of parameters. I think the complexity is inevitable if we want optimal performance. We could try to somehow organize these tuning parameters, e.g. into files/folders. I am open to suggestions.

@martin-frbg martin-frbg added this to the 0.3.30 milestone Jan 23, 2025
@martin-frbg
Copy link
Collaborator

Merging to give it more exposure, but I think ideally the table part should be in (or closer to) param.h

@martin-frbg martin-frbg merged commit a54f9a9 into OpenMathLib:develop Jan 23, 2025
75 of 84 checks passed
@annop-w annop-w deleted the sgemm_throttling branch February 28, 2025 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants